A PAC algorithm in relative precision for bandit problem with costly sampling
Authors
Abstract
This paper considers the problem of maximizing an expectation function over a finite set, or finite-arm bandit problem. We first propose a naive stochastic bandit algorithm for obtaining a probably approximately correct (PAC) solution to this discrete optimization problem in relative precision, that is, a solution which solves the problem up to a relative error smaller than a prescribed tolerance, with high probability. We also propose an adaptive stochastic bandit algorithm which provides a PAC solution with the same guarantees. The adaptive algorithm outperforms the mean complexity of the naive algorithm in terms of the number of generated samples and is particularly well suited for applications with a high sampling cost.
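To make the naive strategy concrete, the following is a minimal illustrative sketch of a PAC arm-selection scheme via Hoeffding's inequality, assuming rewards bounded in a known range; it uses absolute rather than relative precision, so it is not the paper's algorithm, and the function name `naive_pac_best_arm` and its parameters are hypothetical.

```python
import math

def naive_pac_best_arm(sample_arm, n_arms, eps=0.1, delta=0.05, bound=1.0):
    """Naive PAC selection: draw enough i.i.d. samples from each arm so that,
    by Hoeffding's inequality, every empirical mean lies within eps/2 of its
    true mean with probability at least 1 - delta, then return the arm with
    the largest empirical mean (an eps-optimal arm with high probability).

    sample_arm(k) must return one sample of arm k, with values in [0, bound].
    """
    # Per-arm sample size from Hoeffding with a union bound over the arms:
    # P(|mean_hat - mean| >= eps/2) <= 2 exp(-2 n (eps/2)^2 / bound^2) <= delta / n_arms.
    n = math.ceil(2 * (bound / eps) ** 2 * math.log(2 * n_arms / delta))
    means = [sum(sample_arm(k) for _ in range(n)) / n for k in range(n_arms)]
    return max(range(n_arms), key=means.__getitem__)
```

The fixed, worst-case sample size `n` is exactly what the paper's adaptive variant is designed to improve on: by allocating samples adaptively, clearly suboptimal arms can be discarded early instead of receiving the full budget.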
Similar resources
The Algorithm for Solving the Inverse Numerical Range Problem
The numerical range of a square matrix A, denoted W(A), is defined as W(A) = {x*Ax : x ∈ S1}, where S1 is the unit sphere. In 2009, Russell Carden posed the inverse numerical range problem: given a point z ∈ W(A), find a vector x ∈ S1 such that z = x*Ax. In this thesis, we present an algorithm for solving the inverse numerical range problem.
Algorithm Selection as a Bandit Problem with Unbounded Losses
Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance model is iteratively updated and used to guide selection on a sequence of problem instances. The resulting exploration-exploitation trade-off was represented as a...
PAC Identification of a Bandit Arm Relative to a Reward Quantile
We propose a PAC formulation for identifying an arm in an n-armed bandit whose mean is within a fixed tolerance of the m-th highest mean. This setup generalises a previous formulation with m = 1, and differs from yet another one which requires m such arms to be identified. The key implication of our proposed approach is the ability to derive upper bounds on the sample complexity that depend on n/m...
Analysis of Thompson Sampling for the Multi-armed Bandit Problem
The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to pla...
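The basic idea described above can be sketched for the standard Beta-Bernoulli case; this is a generic textbook formulation of Thompson Sampling, not code from the cited paper, and the function name `thompson_sampling` and its signature are assumptions for illustration.

```python
import random

def thompson_sampling(pull, n_arms, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling: keep a Beta(s + 1, f + 1) posterior
    over each arm's success probability, draw one sample from each posterior,
    pull the arm whose draw is largest, and update that arm's success/failure
    counts with the observed 0/1 reward.

    pull(k) must return a 0/1 reward for arm k. Returns (successes, failures).
    """
    rng = random.Random(seed)
    succ = [0] * n_arms
    fail = [0] * n_arms
    for _ in range(horizon):
        # One posterior sample per arm; Beta(1, 1) is the uniform prior.
        draws = [rng.betavariate(succ[k] + 1, fail[k] + 1) for k in range(n_arms)]
        arm = max(range(n_arms), key=draws.__getitem__)
        if pull(arm):
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

Because arms are chosen by sampling from the posterior rather than by maximizing a point estimate, exploration arises naturally: uncertain arms occasionally produce large posterior draws and get pulled, while consistently poor arms are sampled less and less often.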
Scalable Discrete Sampling as a Multi-Armed Bandit Problem
Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling also suffers from high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models...
Journal
Journal title: Mathematical Methods of Operations Research
Year: 2022
ISSN: 0042-0573, 1432-5217, 1432-2994
DOI: https://doi.org/10.1007/s00186-022-00769-x